Fast High-dimensional Kernel Summations Using the Monte Carlo Multipole Method
Abstract
We propose a new fast, high-accuracy Gaussian summation algorithm for high-dimensional datasets. First, we extend the original fast multipole-type methods to use approximation schemes with both hard and probabilistic error bounds. Second, we use a new data structure called the subspace tree, which maps each data point in a node to its lower-dimensional representation as determined by any linear dimension reduction method such as PCA. This data structure reduces the cost of each pairwise distance computation, the dominant cost in many kernel methods. Our algorithm guarantees probabilistic relative error on each kernel sum, and can be applied to the high-dimensional Gaussian summations that are the key computational bottleneck in many kernel methods. We provide empirical speedup results on low- to high-dimensional datasets of up to 89 dimensions.
1 Fast Gaussian Kernel Summation
In this paper, we propose new computational techniques for efficiently approximating the following sum for each query point q_i ∈ Q:
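As a minimal sketch of the kind of computation the abstract describes, assume the usual Gaussian kernel sum, Φ(q_i) = Σ_j exp(-||q_i - r_j||^2 / (2 h^2)), taken over the rows r_j of a reference set with bandwidth h. The names `refs`, `h`, `rel_err`, and all three functions below are illustrative assumptions, not the paper's notation or algorithm. The code compares exact evaluation with a plain Monte Carlo estimate that keeps sampling reference points until a normal-approximation bound on the relative error is met, and shows a PCA projection whose distances are cheaper to compute and never exceed the full-dimensional ones. It leaves out the tree structures, the multipole-type expansions, and the hard error bounds entirely.

```python
import numpy as np


def exact_gaussian_sum(query, refs, h):
    """Exact sum over rows r of refs of exp(-||query - r||^2 / (2 h^2))."""
    sq_dists = np.sum((refs - query) ** 2, axis=1)
    return float(np.sum(np.exp(-sq_dists / (2.0 * h * h))))


def monte_carlo_gaussian_sum(query, refs, h, rel_err=0.1, z=1.96,
                             batch=200, rng=None):
    """Estimate the same sum by uniform sampling with replacement.

    Sampling stops once the normal-approximation half-width
    z * std / sqrt(m) is at most rel_err * mean. If that never happens
    before as many samples as reference points have been drawn, the
    function falls back to the exact sum.
    Returns (estimate, kernel_evaluations_used).
    """
    rng = np.random.default_rng() if rng is None else rng
    n = refs.shape[0]
    samples = np.empty(0)
    while samples.size < n:
        idx = rng.integers(0, n, size=batch)
        sq_dists = np.sum((refs[idx] - query) ** 2, axis=1)
        samples = np.concatenate([samples, np.exp(-sq_dists / (2.0 * h * h))])
        mean = samples.mean()
        half_width = z * samples.std(ddof=1) / np.sqrt(samples.size)
        if mean > 0.0 and half_width <= rel_err * mean:
            # Scale the sample mean back up to a sum over all n points.
            return n * float(mean), int(samples.size)
    return exact_gaussian_sum(query, refs, h), n


def pca_project(points, k):
    """Project points onto their top-k principal directions (via SVD).

    The rows of `components` are orthonormal, so distances between
    projected points are lower bounds on the original distances.
    Returns (projected_points, components, mean).
    """
    mean = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - mean, full_matrices=False)
    components = vt[:k]
    return (points - mean) @ components.T, components, mean


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    refs = rng.normal(size=(5000, 20))   # synthetic data, for illustration only
    query = np.zeros(20)
    print("exact:", exact_gaussian_sum(query, refs, h=3.0))
    print("monte carlo:", monte_carlo_gaussian_sum(query, refs, h=3.0, rng=rng))
```

The stopping rule here rests on the central limit theorem, so the relative-error guarantee is only approximate. The estimate is also unreliable when a few reference points dominate the sum, which is one reason a practical method would combine sampling with spatial partitioning.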
Related resources
Ultrafast Monte Carlo for Kernel Estimators and Generalized Statistical Summations
Machine learning contains many computational bottlenecks in the form of nested summations over datasets. Kernel estimators and other methods are burdened by these expensive computations. Exact evaluation is typically O(n^2) or higher, which severely limits application to large datasets. We present a multi-stage stratified Monte Carlo method for approximating such summations with probabilistic rel...
Lekner summations and Ewald summations for quasi-two dimensional systems
Using the specific model of a bilayer of classical charged particles (bilayer Wigner crystal), we compare the predictions for energies and pair distribution functions obtained by Monte Carlo simulations using three different methods available to treat the long range Coulomb interactions in systems periodic in two directions but bound in the third one. The three methods compared are: the Ewald m...
ASKIT: An Efficient, Parallel Library for High-Dimensional Kernel Summations
Kernel-based methods are a powerful tool in a variety of machine learning and computational statistics methods. A key bottleneck in these methods is computations involving the kernel matrix, which scales quadratically with the problem size. Previously, we introduced ASKIT, an efficient, scalable, kernel-independent method for approximately evaluating kernel matrix-vector products. ASKIT is base...
Monte Carlo simulations for clutter statistics in minefields: AP-mine-like-target buried near a dielectric object beneath 2-D random rough ground surfaces
A rigorous three-dimensional (3-D) electromagnetic model is developed to analyze the scattering from an anti-personnel (AP) nonmetallic mine-like target when it is buried near a clutter object under two-dimensional (2-D) random rough surfaces. The steepest descent fast multipole method (SDFMM) is implemented to solve for the unknown electric and magnetic surface currents on the ground surface, on ...
Ultrafast Monte Carlo for Statistical Summations
Machine learning contains many computational bottlenecks in the form of nested summations over datasets. Computation of these summations is typically O(n^2) or higher, which severely limits application to large datasets. We present a multistage stratified Monte Carlo method for approximating such summations with probabilistic relative error control. The essential idea is fast approximation by sam...
Publication date: 2008